feat(core): atomic write rollout for credentials, memory, config, JSONL (closes #3681, #4095 Phase 2)#4333
feat(core): atomic write rollout for credentials, memory, config, JSONL (closes #3681, #4095 Phase 2)#4333doudouOUC wants to merge 29 commits into
Conversation
Sync mirror of atomicWriteFile for code paths that can't await (settings persistence on exit, sync config writers). Same semantics: symlink chain resolution, permission preservation, fsync via flush:true, EPERM/EACCES rename retry, EXDEV fallback to direct write. Add forceMode option on AtomicWriteFileOptions — when true, ignore the existing target's permission bits and apply options.mode regardless. Needed for credential files that must heal historically over-permissive files (e.g. a 0o644 token restored from backup must be forced to 0o600). Honored by both async and sync paths. Default false preserves existing behavior. Reuses Atomics.wait for true blocking sleep in renameWithRetrySync — no busy-wait, no extra dep. Refs: #4095 Phase 2
…ier 1) Route all OAuth credential persistence through atomicWriteFile with forceMode: true, so a process crash mid-write cannot leave the user with a half-written token file, and historically over-permissive files (e.g. 0o644 from a manual restore) are healed to 0o600 on the next write. - oauth-token-storage.ts: setCredentials, deleteCredentials - file-token-storage.ts: saveTokens (encrypted MCP token storage) - qwenOAuth2.ts: cacheQwenCredentials (also fixes missing mode — was inheriting 0o644 from umask, now forced to 0o600) - sharedTokenManager.ts: saveCredentialsToFile — drops ~15 lines of hand-rolled tmp + rename in favor of the shared helper Lock-file writes using flag: 'wx' (sharedTokenManager.ts:720) are intentionally left untouched — they rely on exclusive-create semantics that atomic write does not preserve. Tests updated to mock atomicWriteFile instead of fs.writeFile. Refs: #4095 Phase 2
…Tier 2) Route all auto-memory state persistence through atomicWriteFile so a process crash during a dream/extract/forget cycle cannot corrupt the metadata sidecar, extraction cursor, or topic body files. Touched: manager (writeDreamMetadata), extract (writeExtractCursor + bumpMetadata), indexer (rebuild), dream (bumpDreamMetadata), forget (bumpMetadata + topic body rewrite). manager.ts:362 acquireDreamLock uses flag: 'wx' for exclusive create — left untouched, atomic write does not preserve that semantic. Uses atomicWriteFile (not atomicWriteJSON) to preserve the trailing newline these files have always had. Refs: #4095 Phase 2
…4095 Tier 3a) Route the remaining state-file write paths through atomic helpers so a crash mid-write cannot corrupt config, log, or session-scoped state: - trustedFolders.ts (sync): atomicWriteFileSync — sole path that flips workspace trust, must not half-write - logger.ts (4 sites): atomicWriteFile — full-file JSON rewrites for logs.json and per-checkpoint files - tipHistory, installationManager, projectSummary, todoWrite, trustedHooks: bonus sites with the same shape (state JSON written multiple times per session) todoWrite is on the hot path — writes every time the todo list mutates — so the added rename + fsync cost is measurable (a few ms per write on SSD). Trade-off accepted to avoid a half-written todos file silently breaking the next session's resume. Export atomicWriteFile / atomicWriteFileSync from the core public index so CLI-side callers (trustedFolders, tipHistory) can reach them. Tests updated: - logger.test.ts uses vi.importActual to re-export the real helper and override per-test via vi.mocked(atomicWriteFile).mockRejectedValueOnce - trustedFolders.test.ts and todoWrite.test.ts mock the helper directly Refs: #4095 Phase 2
#3656 fixed the read side of glued '}{' JSONL records — when a process was killed mid-appendFile, the trailing '\n' was lost and the next record was concatenated. The write side was left for a follow-up (#3681). This adds flush:true (fsync) to every per-line append: - jsonl-utils.ts writeLine / writeLineSync (session transcripts, auto-titles, prompt history) - debugLogger.ts appendFile (per-session debug log) jsonl-utils.ts write() (full-file replace) now goes through atomicWriteFileSync so a crash during overwrite cannot corrupt the session transcript either. Trade-off: fsync on every append adds disk-sync latency (single-digit ms on SSD, more on spinning disk / network FS). Acceptable for a few writes per turn; the alternative is silently losing the last record of every interrupted session, which #3681 explicitly flagged. Refs: #4095 Phase 2 Tier 3b Closes: #3681
Catch up two sites where Claude Code's equivalent path is atomic but qwen-code's isn't (verified against /Users/jinye.djy/Projects/claude-code on 2026-05-19): - extension/extensionManager.ts:533, :1073 — enablement config and install metadata writes. Claude Code's plugin install-counts and zip cache use atomic temp+rename via writeFileSyncAndFlush_DEPRECATED. - lsp/NativeLspService.ts:1351 — applying an LSP edit to a user file. Claude Code's FileWriteTool/FileEditTool both route through atomic writeTextContent → writeFileSyncAndFlush_DEPRECATED. A bare writeFileSync here could half-write the user's source file if the process is killed during an LSP-driven rename or quick-fix. Also clean up stale fs.rename mock setups in sharedTokenManager.test.ts that became no-ops after Tier 1 migration (rename is no longer called by saveCredentialsToFile). The fs.writeFile mocks stay because the wx-flag lock path still uses them. Refs: #4095 Phase 2
- packages/core/src/index.ts: move atomicFileWrite export to its alphabetical position (before browser.js) - tipHistory.ts: add forceMode: true to atomicWriteFileSync for consistency with other 0o600 sites — heals legacy 0o644 files even though tips are non-critical Refs: #4095 Phase 2
Three issues caught by post-merge Codex review of the #4095 Phase 2 branch — none had user-visible symptoms yet but all were latent bugs. 1. atomicFileWrite: forceMode without mode silently downgraded perms `if (!options?.forceMode)` skipped the existing-mode stat whenever forceMode was true, regardless of whether `mode` was also supplied. Calling `atomicWriteFile(p, data, { forceMode: true })` (no mode) on an existing 0o600 file produced 0o644 (umask default) instead of preserving 0o600. Tightened the guard to also require `options.mode` to be defined; mirrored fix in atomicWriteFileSync. Added two regression tests (async + sync) that assert mode preservation. 2. logger.test.ts: vi.resetAllMocks() blanked the atomicWriteFile shim The vi.fn(actual.atomicWriteFile) factory implementation gets reset to a no-op by `vi.resetAllMocks()` in beforeEach, which would make `logger.initialize()` silently skip creating logs.json on disk. Tests passed by coincidence (file pre-existence from prior runs). Captured the real implementation at module load and re-attach it via `mockImplementation` after each reset. 3. NativeLspService.applyTextEdits: atomic write bypassed file unwritability The read catch swallowed every error and treated it as "new empty file". With atomic write (tmp + rename), an unreadable target on a writable parent could be replaced with edits applied to an empty buffer — the old fs.writeFileSync would have errored on the target permission. Now only ENOENT is treated as new-file; other read errors (EACCES, EISDIR, etc.) propagate. Refs: #4095 Phase 2
The previous fix only handled "read failed → propagate the error". Codex round 2 caught the remaining gap: a file that's readable but chmod 0444 (read-only) would still be replaced by the atomic rename, because rename only needs parent-directory write access. Add an explicit fs.accessSync(W_OK) check before the atomic write. ENOENT is allowed through so LSP can still create new files via edits. Refs: #4095 Phase 2
…nd 3)
`saveCredentialsToFile` wrapped `atomicWriteFile` in `withTimeout(5000)`.
If the call hits the 5s budget (e.g. slow NFS home, network-backed
storage, fsync added by Phase 2), withTimeout rejects but the
atomicWriteFile internal write+rename keeps running unobserved:
1. withTimeout rejects → saveCredentialsToFile throws
2. performTokenRefresh `finally` releases the refresh lock
3. Another process acquires the lock and writes newer credentials
4. The original atomicWriteFile finally completes its rename and
overwrites the newer credentials — silent token rollback
Pre-migration the code awaited the tmp write and the rename in two
separate withTimeout calls; a timed-out tmp write never reached the
rename so there was no race against the target file. The migration
collapsed both into one inseparable atomicWriteFile, which made the
timeout actively unsafe (the work cannot be cancelled after the
timeout fires — fs.rename is not abortable).
Atomic write is durable by design — accept the I/O latency. The
mkdir and stat timeouts are kept (idempotent and read-only
respectively, no corruption risk on late completion).
Refs: #4095 Phase 2
There was a problem hiding this comment.
Pull request overview
This PR expands the atomic file-write rollout across core persistence paths, focusing on credential safety, JSONL/session durability, memory metadata, config/state files, and LSP-driven edits.
Changes:
- Adds synchronous atomic write support and
forceModepermission tightening. - Migrates high-value bare writes/appends to atomic writes or flushed appends.
- Updates affected unit tests and mocks to validate the new write paths.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
packages/core/src/utils/atomicFileWrite.ts |
Adds forceMode, sync atomic write helper, and sync rename retry support. |
packages/core/src/utils/atomicFileWrite.test.ts |
Adds sync helper and forceMode regression coverage. |
packages/core/src/utils/jsonl-utils.ts |
Flushes JSONL appends and atomically rewrites full JSONL files. |
packages/core/src/utils/debugLogger.ts |
Flushes debug log appends. |
packages/core/src/utils/debugLogger.test.ts |
Updates append expectations for flushed writes. |
packages/core/src/utils/projectSummary.ts |
Uses atomic write for welcome-back state. |
packages/core/src/utils/installationManager.ts |
Uses sync atomic write for installation ID persistence. |
packages/core/src/core/logger.ts |
Uses atomic writes for log and checkpoint JSON files. |
packages/core/src/core/logger.test.ts |
Mocks atomic writes while preserving real disk behavior by default. |
packages/core/src/qwen/qwenOAuth2.ts |
Writes cached Qwen credentials atomically with restricted mode. |
packages/core/src/qwen/qwenOAuth2.test.ts |
Adds atomic write mock. |
packages/core/src/qwen/sharedTokenManager.ts |
Replaces manual temp/rename credential write with atomic helper. |
packages/core/src/qwen/sharedTokenManager.test.ts |
Updates credential save mocks. |
packages/core/src/mcp/oauth-token-storage.ts |
Stores MCP OAuth token files with atomic restricted writes. |
packages/core/src/mcp/oauth-token-storage.test.ts |
Updates tests for atomic token writes. |
packages/core/src/mcp/token-storage/file-token-storage.ts |
Stores encrypted token file with atomic restricted write. |
packages/core/src/mcp/token-storage/file-token-storage.test.ts |
Updates encrypted token write expectations. |
packages/core/src/memory/manager.ts |
Writes auto-memory metadata atomically. |
packages/core/src/memory/indexer.ts |
Writes memory index atomically. |
packages/core/src/memory/forget.ts |
Writes memory metadata and topic files atomically. |
packages/core/src/memory/extract.ts |
Writes extraction cursor and metadata atomically. |
packages/core/src/memory/dream.ts |
Writes dream metadata atomically. |
packages/core/src/tools/todoWrite.ts |
Writes todo state atomically. |
packages/core/src/tools/todoWrite.test.ts |
Updates todo write tests for atomic helper. |
packages/core/src/lsp/NativeLspService.ts |
Applies LSP edits via sync atomic write with read/write error handling. |
packages/core/src/hooks/trustedHooks.ts |
Writes trusted hooks config atomically. |
packages/core/src/extension/extensionManager.ts |
Writes extension enablement and install metadata atomically. |
packages/core/src/index.ts |
Re-exports atomic file write utilities. |
packages/cli/src/services/tips/tipHistory.ts |
Writes tip history atomically with restricted mode. |
packages/cli/src/config/trustedFolders.ts |
Writes trusted folders config atomically with restricted mode. |
packages/cli/src/config/trustedFolders.test.ts |
Updates trusted folder save expectations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…bling-cook # Conflicts: # packages/core/src/qwen/qwenOAuth2.ts
Address PR review suggestions from wenshao (via qwen-latest /review): neither renameWithRetry/Sync nor the EXDEV cross-device fallback had direct test coverage. Both paths are critical (Windows AV contention, Docker tmpfs /tmp) and a regression would degrade silently. Vitest can't spy on ESM exports of `node:fs` (`Cannot redefine property: renameSync`), so add narrow internal test seams instead: - renameWithRetry / renameWithRetrySync take an optional `_renameImpl` parameter, defaulting to fs.rename / fs.renameSync. - atomicWriteFile / atomicWriteFileSync take an optional `_testFs` parameter with `rename` and `writeFile` overrides, forwarded to the retry helper and used in the EXDEV fallback branch. The seams are underscore-prefixed and JSDoc-tagged as "Internal test seam — production callers never pass this", which keeps the public API clean while making the behavior testable. New coverage (+9 tests, 36 → 45): - renameWithRetry: retry-EPERM-then-succeed, give-up after retries, no-retry on non-retryable (ENOSPC) - renameWithRetrySync: same 3 patterns (EACCES, EPERM exhausted, EINVAL) - EXDEV fallback: async direct write + tmp cleanup, sync ditto, non-EXDEV failure propagates without fallback (rejects EIO + tmp cleanup) Refs: #4333, #4095 Phase 2
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
…sh:true CI failure on all 3 OSes (macos / ubuntu / windows): sdk.test.ts asserted `fs.appendFile` was called with `'utf8'` as the 3rd positional argument, but commit b7badc7 (#4095 Tier 3b — JSONL fsync) changed the `debugLogger.ts` appendFile call from string-form to options-form `{ encoding: 'utf8', flush: true }` to enable per-line fsync. Update the 3 assertions in the telemetry diagnostics test to match the new shape. No behavior change — debugLogger still flushes per append; only the assertion in this previously-unrelated suite needed updating. Refs: #4333, #4095 Phase 2 Tier 3b
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] qwenOAuth2.ts cacheQwenCredentials was de-migrated from atomicWriteFile back to a hand-rolled temp+chmod+rename pattern during merge conflict resolution (commit 7b1a7ae38). The PR description still lists qwenOAuth2 under Tier 1 migration and the behavior-changes table references qwen/qwenOAuth2.ts:982, but the file is no longer in the diff and line 982 now points to pollDeviceToken (the function starts at ~1060 after the merge).
The hand-rolled version uses a predictable temp path component (process.pid) versus atomicWriteFile's crypto.randomBytes(6), and lacks EPERM rename retry and EXDEV fallback. Consider either re-migrating to atomicWriteFile (consolidating onto the shared, tested helper) or moving qwenOAuth2 to the "Deliberately not in scope" section with a rationale and fixing the stale line reference.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
…4333 review) Address three review suggestions from wenshao (via qwen-latest /review), each pointing at a real coverage gap introduced by this PR: 1. NativeLspService.applyTextEdits error branches (round-2 LSP fix): the ENOENT-only read guard and the fs.accessSync(W_OK) refusal had no automated coverage. Added 3 tests accessing applyTextEdits via a typed cast (the method is private; making it public for one verification inflates API surface). Tests use chmod 0000 / chmod 0444 reproducers and assert (a) read failure propagates EACCES without silently overwriting with empty content, (b) W_OK rejects with EACCES/EPERM before the atomic rename touches the target, (c) nonexistent files are still accepted so LSP can create via edits. 2. atomicWriteFileSync non-EXDEV rename failure cleanup: the async counterpart had an explicit EIO-rename test asserting tmp cleanup; the sync variant did not. Added the mirror — injects a sync rename throwing EIO via the existing _testFs seam and asserts `readdirSync(tmpDir).length === 0`. 3. jsonl-utils writeLine / writeLineSync / write smoke tests: the three write paths are the core fix for #3681 (the PR's headline goal) but downstream callers (chatRecordingService, sessionService) mock them entirely. Without direct unit tests, a regression that dropped `flush: true` or reverted `write()` to bare writeFileSync would go undetected. Added 3 real-fs roundtrip tests. Test count delta: - NativeLspService.test.ts: 15 → 18 - atomicFileWrite.test.ts: 45 → 46 - jsonl-utils.test.ts: 22 → 25 Refs: #4333, #4095 Phase 2
|
Re #4333 (review) ( Updated the PR body to remove the stale Why not re-migrate: upstream PR #4255 added The trade-off is real: upstream's hand-roll loses EPERM retry + EXDEV fallback + symlink chain resolution vs the shared helper. That gap is worth a follow-up that extends |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.5 via Qwen Code /review
Two PR #4333 round-8 review items: 1. The `catch {}` around `fd.chmod(desiredMode)` (and `fchmodSync`) was intended to tolerate filesystems without POSIX permissions (FAT/exFAT) but silently swallowed every error code: EPERM under a hardened sandbox, EIO on flaky NFS/CIFS, EROFS if the filesystem is remounted read-only mid-write. Because this path operates on the *final credential file* — not a tmp file — a silent fchmod failure leaves the target at the umask-masked open() mode with no diagnostic trail. Narrowed the catch to ENOSYS/ENOTSUP so the FAT/exFAT case still tolerates failure but security-relevant errors propagate. 2. The round-7 noFollow-EXDEV-new-file tests asserted the final mode (0o600) but didn't verify the *mechanism*. Under typical umask 0o022, `open(O_EXCL, 0o600)` already creates the file at 0o600, so a regression that swapped `fd.chmod()` back to a path-based `tryChmod(targetPath)` (the pre-fix TOCTOU-vulnerable form) would leave the mode assertion passing — defeating the round-7 fix undetected. Added a `chmod` test seam to `atomicWriteFile` / atomicWriteFileSync` (mirroring the existing `rename` / `writeFile` seams) and asserted that path-based chmod is never invoked against the credential target on this code path. 54/54 atomicFileWrite tests pass. Refs: #4333, #4095 Phase 2
…cover both branches Two PR #4333 round-9 review items, one critical correctness bug introduced by the round-8 catch narrowing: 1. **[Critical]** When `fd.chmod(desiredMode)` (or `fchmodSync`) on the noFollow EXDEV fallback throws a propagating error (EPERM under seccomp, EIO on flaky NFS, EROFS after a remount), the file created by `open(targetPath, O_CREAT|O_EXCL)` remained on disk after the error rethrew. Every subsequent credential write retry then hit EEXIST from O_EXCL and failed permanently — turning a transient sandbox EPERM into a permanent OAuth-refresh deadlock that requires manual file removal. All three credential write sites (`sharedTokenManager`, `oauth-token-storage` save+delete, `file-token-storage`) hit this code path. Fix: a `writeOk` flag plus nested try/catch — fd is closed in the inner finally, then if write/ sync/fchmod failed, the orphan is unlinked best-effort before the error rethrows. 2. The fchmod catch-narrowing (the headline behavior of round 8) had zero test coverage on either branch — `_testFs` exposed `rename`, `writeFile`, and path-based `chmod`, but no hook for the open-fd `fchmod`. A one-line revert from the narrowed catch back to `catch {}` would pass every existing test. Added `fchmod` field to `_testFs` (async + sync) and four tests: - ENOSYS swallowed → write succeeds (FAT/exFAT happy path) - EPERM propagates AND `targetPath` is absent (regression-tests #1) for both async and sync. 58/58 atomicFileWrite tests pass (was 54). Async + sync parity preserved. Refs: #4333, #4095 Phase 2
…n unlink Four PR #4333 round-10 review items, all small bounded fixes: 1. Parameterized the FAT/exFAT fchmod-swallow tests over both ENOSYS (Linux) and ENOTSUP (macOS). Round-9 only covered ENOSYS — a one-token regression dropping `ENOTSUP` from the catch condition would have passed every existing test. 2. The orphan-unlink catch block was empty (`/* best effort */`), leaving no diagnostic trail when the cleanup itself fails (EROFS, immutable flag, sandboxed container). Added a `createDebugLogger` import (`'ATOMIC_WRITE'` category) and a debug log so incident response can correlate the original write error with a subsequent EEXIST loop. Async + sync. 3. The pre-open `unlink(targetPath)` correctly propagates non-ENOENT errors (EACCES on parent dir, EROFS), but no test exercised that path. Added an `unlink` field to the `_testFs` seam (async + sync, matching the existing `rename` / `writeFile` / `chmod` / `fchmod` pattern) and two tests verifying EACCES propagates instead of getting hidden behind a downstream EEXIST from O_EXCL. 4. `resolveSymlinkChain(filePath)` ran before the function's main try-block, so symlink-resolution errors (EACCES on intermediate dir, ELOOP from circular chain) bypassed the `atomicWriteFile("path"): ...` annotation that every other failure path applies — leaving `err.path` referencing an internal intermediate directory the caller never asked about. Wrapped with `.catch(err => throw annotateWriteError(err, filePath))` (async) and the equivalent try/catch for sync. Added a real-fs ELOOP regression test for both variants (skipped on Windows). 64/64 atomicFileWrite tests pass (was 58). Refs: #4333, #4095 Phase 2
… + cover backoff/mkdir branches Five PR #4333 round-11 review items, all small and bounded: 1. **atomicWriteJSON option-type widening**: Was typed as the narrower `AtomicWriteOptions` (retries / delayMs only), so credential-grade options added in this PR (`mode`, `forceMode`, `noFollow`) and pre-existing ones (`flush`, `encoding`) were silently dropped at the type level even though the body spread them at runtime. A future maintainer calling `atomicWriteJSON(credPath, creds, {noFollow: true, mode: 0o600, forceMode: true})` would have typechecked but silently lost noFollow + forceMode + mode. Widened to `AtomicWriteFileOptions`. 2. **trustedHooks 0o600 + forceMode**: This file lists user-approved *executable hook commands* — strictly more sensitive than the sibling state files (`trustedFolders.json`, `tipHistory.json`) that already use `{mode: 0o600, forceMode: true}`. Was dropping to the process umask (0o644 by default), and a backup-restored looser mode was never healed. Now matches the sibling pattern. 3. **renameWithRetry exponential-backoff coverage**: Existing tests covered retry count and error propagation but not the `delayMs * 2 ** attempt` curve itself. A regression to linear, constant, or — worst — regressive backoff (which intensifies under Windows AV-scan stress) would have passed every existing test. Added a test using `vi.useFakeTimers()` that records gaps between mock-rename invocations and asserts `[delayMs, 2*delayMs, 4*delayMs]` for `(retries=3, delayMs=50)`. 4. **jsonl-utils write() parent-dir creation**: The other `write()` test targets a path inside the pre-created `tmpRoot`, so the `!existsSync(dir) → mkdirSync(dir, {recursive: true})` branch was never exercised. Added a one-liner test that targets a deeply nested non-existent path. 5. **writeLineSync docstring accuracy**: The docstring claimed "uses a simple flag-based locking mechanism (less robust than async version)" but there is no flag-based locking — and `writeLine` serializes via per-file `Mutex` that this function bypasses. Now accurately documents the lack of locking, the bypass, and the `flush: true` rationale (closes #3681). Test results: 65/65 atomicFileWrite tests pass (was 64), 26/26 jsonl-utils tests pass (was 25). Typecheck clean. Refs: #4333, #4095 Phase 2
…ger format Two PR #4333 round-12 review items, both real bugs: 1. **flush:true was silently a no-op for string payloads on appendFile**. Node's C++ fast path (binding.writeFileUtf8) for string + utf8 + appendFile bypasses the JS-side flush/fsync logic entirely — empirically string+flush:true takes ~0.05ms/op (identical to no flush) while Buffer+flush:true takes ~4.9ms/op (91× slower, proving fsync only runs for Buffer payloads). The data still reaches the kernel page cache (the syscall is synchronous), so `kill -9` is fine, but power-loss durability — the actual #3681 guarantee — was silently absent. Fix: pass `Buffer.from(line, 'utf8')` to both writeLine (async) and writeLineSync. This forces the JS slow path that honors `flush: true` and actually fsyncs the file. Updated the JSDoc on both functions to document the C++ fast-path bypass so a future maintainer doesn't revert to the simpler string form. 2. **`debugLogger.debug` doesn't do printf substitution**. `debugLogger`'s `formatArgs` (debugLogger.ts:67-77) just joins args with spaces — no `util.format()`. The round-10 calls used `'orphan unlink failed for %s: %s'` which rendered the literal `%s` markers in the log: orphan unlink failed for %s: %s /path/to/target Error: EACCES instead of: orphan unlink failed for /path/to/target: Error: EACCES Switched both async and sync sites to template literals, matching every other `debugLogger` call site in the codebase. Refs: #4333, #4095 Phase 2
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen-latest-series-invite-beta-v36 via Qwen Code /review
PR #4333 验证报告PR: 结论 —— ✅ 建议合并PR 的每一项功能声明都已通过真实构建、完整自动化测试套件、库级集成 harness、系统调用级 运行环境
1. 静态检查 —— 全部干净
2. 自动化测试 —— 全绿
3. 库级集成 harness —— 32/32 通过直接针对真实磁盘运行
¹ F1( 4. 系统调用级持久化取证(
|
| 路径 | string + flush:true |
Buffer + flush:true |
无 flush(对照) |
|---|---|---|---|
appendFile(异步) |
5 fsync | 5 fsync | 0 fsync |
appendFileSync |
5 fsync | 5 fsync | — |
writeFile(flag w,atomicWriteFile 临时文件所用) |
5 fsync | 5 fsync | 0 fsync |
真实 writeLine() |
— | 5 调用 / 5 fsync | — |
在 Node v22.22.2 与 v24.12.0 上结果一致。
- ✅
writeLine()每次 append 都 fsync —— chore(core): jsonl reader/writer follow-ups from #3656 #3681 的持久化在内核边界得到确认。 - ✅
atomicWriteFile的临时文件在 rename 前确实 fsync(字符串负载,即凭证/配置/记忆的常见情形)—— 原子写入的持久化保证成立。 - ✅ 无
flush对照组为 0 fsync,证明那 5 次 fsync 确由flush:true引起。
5. 真实 CLI 端到端 —— tmux 下流式输出中途 kill -9
在 tmux 会话中运行 npm run dev -- -m qwen3-coder-plus(经 tsx 跑 worktree 源码):
| 步骤 | 结果 |
|---|---|
| 启动 dev 构建 | ✅ 干净 —— API Key | qwen3-coder-plus |
| 发送 1000 字提示,模型流式输出 | ✅ token 流式返回(↓ 26 tokens) |
在流式输出中途 kill -9 CLI 进程 |
✅ 进程瞬间终止 |
logs.json(原子化的 logger.ts) |
✅ 合法 JSON 数组,51 条 |
会话 .jsonl(writeLine) |
✅ 2 行,0 解析失败,0 粘连 }{ |
runtime.json |
✅ 合法 JSON |
| 重启 dev | ✅ 干净启动 |
/resume 选择器 |
✅ 被杀会话出现在列表(2 minutes ago · pr-4333) |
| 加载被杀会话 | ✅ 历史恢复,无崩溃 |
如实说明范围:流式中的 assistant 回合未落盘(kill 发生在该回合定型之前 —— 属预期)。本测试确认的是已写入内容未被损坏、且会话仍可恢复。「写入进行中被杀」的严格证明见 §3(strace)与 §4 的 E 组(kill 下写出 19782 行,全部完好)。
6. 提示性说明(非阻塞)
最后一个提交 ef6c08a04 称 Node「string + utf8 + appendFile 的 C++ 快路径会绕过 JS 侧的 flush 逻辑,使 {flush:true} 对字符串成为静默空操作」。该现象未能复现 —— 在 Node v22.22.2 与 v24.12.0 上,appendFile / appendFileSync / writeFile 对字符串负载均正常处理 flush:true(§4:各 5 fsync;无 flush 时为 0)。最可能的原因是:传入 {flush:true} 选项对象本身就已走 JS 慢路径。
这不是缺陷。 实际代码改动 —— 在 writeLine/writeLineSync 中改传 Buffer —— 行为等价(fsync 行为完全一致,已验证),是无害的前向兼容保险。只是该提交信息/代码注释的理由把一个版本相关的说法夸大了。可选项:软化注释措辞。不影响正确性,也不影响合并决定。
7. 给维护者的备注
- 真实 OAuth 刷新 + kill 未覆盖(测试环境用 API-key 鉴权,OAuth 路径处于休眠)—— 与 PR 自身「Not covered」一节一致。凭证写入的形态由单测覆盖(
oauth-token-storage28、file-token-storage16、sharedTokenManager32),并由 §3 G 组的noFollow符号链接防护测试覆盖。 - JSONL 的
flush:true每次 append 引入一次真实 fsync(每回合数毫秒)—— 一个已被接受、已记录的权衡,验证确实存在。 - PR 落后于
main但可干净合并(MERGEABLE),且已包含一次对main的合并。 vscode-ide-companion/editorGroupUtils.ts中存在一处既有 eslint 警告 —— 该文件不在本 PR 改动范围内,与本 PR 无关。
验证方式:在 Linux x64 / Node 22 & 24 上构建并运行 PR 分支(ef6c08a04)—— 完整 core 测试套件、32 项集成 harness、strace 系统调用追踪,以及 tmux 下真实的 kill -9 CLI 会话。
yiliang114
left a comment
There was a problem hiding this comment.
Left one non-blocking correctness note from a local review pass.
…al write failures Three PR #4333 round-13 review items + one comment-wording softening: 1. **`tryChmod`/`tryChmodSync` catch narrowed (wenshao)** — was bare `catch {}` swallowing all errors, while the round-8 `fchmod` catch was already narrowed to ENOSYS/ENOTSUP. Same security rationale applies — and *specifically* on the EXDEV non-noFollow fallback, `tryChmod(targetPath)` is the *sole* mode-setting mechanism for an existing target (writeFile ignores `mode` when the target exists), so a silent EPERM/EROFS would leave credentials at the old mode. Non-credential callers don't pass `mode` → `desiredMode === undefined` short-circuits, so they're unaffected. 2. **Sync EXDEV `writeSync` → `writeFileSync(fd)` (yiliang114)** — `fsSync.writeSync(fd, buf)` returns bytes-actually-written and can short-write; the current code ignored the return, so a partial write would silently truncate the credential file with the call still returning success after fsync+fchmod. Switched to `fsSync.writeFileSync(fd, buf)` which loops internally per Node spec. The async sibling (`fd.writeFile`) already handles short-writes; this brings sync parity. 3. **`file-token-storage` failure-path coverage (wenshao)** — both `setCredentials` and `deleteCredentials` propagate `atomicWriteFile` rejections (no try/catch around the call), but no test exercised that path. Added two tests mirroring the round-1 sharedTokenManager precedent: ENOSPC on `setCredentials` and EROFS on `deleteCredentials` both rethrow. 4. **Round-12 comment wording softened (wenshao verification report)** — strace on Node v22/v24 confirms string + utf8 + flush:true does fsync correctly today, counter to my round-12 "silent no-op" framing. Buffer is still the safer documented form (forward-compat insurance against any future fast-path optimization), but the commit's claim that it was *fixing* a confirmed bug overstated what reproduces. Reframed both writeLine and writeLineSync comments accordingly without changing the code behavior. Test results: 109/109 affected suites pass (atomicFileWrite 65, jsonl-utils 26, file-token-storage 18). Broader credential/state suites also green: 216/216 across sharedTokenManager + oauth-token-storage + qwenOAuth2 + logger + trustedFolders. Typecheck clean. Refs: #4333, #4095 Phase 2
|
非常感谢这份覆盖深度——尤其是 strace 这层确实把 #3681 的持久化论证从「测试通过」提升到了「内核边界可证」。Linux x64 + Node 22 那一栏正好补上我之前 macOS arm64 + Node 24 的覆盖空缺。 关于 §6 那个提示性说明,你说得对——string + utf8 + flush:true 在 v22/v24 上确实如你的 strace 所示正常 fsync。round-12 那次提交把它说成「silent no-op」过头了。我刚才在 b52eeaf 里把 也再次校对了下其他细节都已对齐你的报告:
Round 13 同时还采纳了你和 yiliang114 的 3 条 inline 建议:
如果还有其他你扫到但报告里没单独列的小项,直接 inline comment 就行,辛苦了 🙏 |
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] saveCredentialsToFile at packages/core/src/qwen/sharedTokenManager.ts calls atomicWriteFile successfully (credentials persisted), then calls fs.stat (wrapped in withTimeout(5000)) to update memoryCache.fileModTime. If fs.stat fails (slow NFS, transient mount issue), the catch block throws TokenManagerError(FILE_ACCESS_ERROR, "Failed to write credentials file: ..."). The error message is misleading: the credentials were successfully written. The failure is only in the cache-update step. In a retry loop, this could trigger an unnecessary token refresh storm. Consider wrapping the fs.stat in its own try/catch that logs but does not propagate the error — memoryCache.fileModTime staleness is self-healing on next read.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
…dHooks tests Three PR #4333 round-14 review items: 1. **Path-level tryChmod / tryChmodSync catch narrowing had zero direct coverage**. Round-13 narrowed the bare `catch {}` to ENOSYS/ENOTSUP only (matching the round-8 fd-level fchmod narrowing), but every existing EXDEV test passed `options: undefined`, so `desiredMode === undefined` short-circuited before any chmod was attempted. A regression that inverted the catch condition (missing `!` prefix) would silently swallow EPERM on every sync credential write (trustedHooks, tipHistory, trustedFolders, etc.) with zero diagnostic signal. Added 6 tests via the existing `_testFs.chmod` seam: parameterized ENOSYS/ENOTSUP swallow + EPERM propagation, for both async (`atomicWriteFile`) and sync (`atomicWriteFileSync`). 2. **Orphan-cleanup unlink on the noFollow EXDEV failure path was using raw `fs.unlink` / `fsSync.unlinkSync` instead of the injected `unlinkImpl` seam**. The pre-open unlink correctly used the seam, but the round-9 orphan cleanup added later bypassed it, making it the only fs operation in `atomicWriteFile` not flowing through the test seam. Routed both async and sync orphan cleanup through `unlinkImpl`, and added 2 tests that inject a spy and assert orphan cleanup is invoked against targetPath after a simulated fchmod EPERM. 3. **`trustedHooks.ts` had no test coverage**. Round-11 migrated it to `atomicWriteFileSync` with `{ mode: 0o600, forceMode: true }` — strictly the most security-sensitive write in the PR since the file stores user-approved executable hook commands — but unlike the sibling files (trustedFolders, tipHistory) it had no test file. A regression that dropped `forceMode: true` or weakened the mode would have passed all existing tests. Created `trustedHooks.test.ts` covering: write goes through atomicWriteFileSync with `{ mode: 0o600, forceMode: true }`, write targets the global qwen dir path, the persisted content matches the hook key derived from the hook config, and round-trip trust/untrust behavior. Test results: 73/73 atomicFileWrite tests (was 65) + 5/5 trustedHooks (new). Typecheck clean. Refs: #4333, #4095 Phase 2
| throw unlinkErr; | ||
| } | ||
| } | ||
| const fd = await fs.open( |
There was a problem hiding this comment.
[Suggestion] fs.open/fsSync.openSync with O_WRONLY | O_CREAT | O_EXCL is hardcoded here (and sync at L582), but the _testFs seam type does not include an open/openSync property. This is the syscall that actually enforces the noFollow guarantee on the EXDEV path — a regression that dropped O_EXCL (re-introducing the symlink TOCTOU) would pass every existing test because no test can force open to fail with EEXIST.
Consider adding open?: (path, flags, mode) => Promise<FileHandle> to the async _testFs type (and openSync? to the sync type), or document that O_EXCL coverage relies on the real-fs noFollow behavioral tests.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
…eanup through seam Two PR #4333 round-15 review items: 1. **`trustedHooks.ts` and `trustedFolders.ts` were missing `noFollow: true`**. The credential write sites (`sharedTokenManager`/`oauth-token-storage`/`file-token-storage`) all pass `{ mode: 0o600, forceMode: true, noFollow: true }` to prevent pre-placed symlink attacks. The trustedHooks comment already called it "strictly more sensitive than trustedFolders / tipHistory" — yet credential paths got symlink protection it didn't. A pre-placed symlink at `~/.qwen/trusted_hooks.json` (or `trustedFolders.json`) could redirect the atomic write to an attacker-controlled path, either leaking the executable-trust list / trusted-folder list, or leaving the user's real config silently stale. Added `noFollow: true` to both write sites and updated the assertions in `trustedHooks.test.ts` and `trustedFolders.test.ts`. 2. **Tmp-file cleanup at L240 (async) and L568 (sync) used raw `fs.unlink` / `fsSync.unlinkSync` instead of the injected `unlinkImpl` seam**. Pre-open unlink and orphan cleanup correctly routed through `unlinkImpl`, making the tmp-cleanup branch the only outlier — `_testFs.unlink`-injecting tests couldn't intercept this path, weakening the seam abstraction. Behavioral impact is nil (cleanup is best-effort, errors swallowed), but consistency matters for future test authors. Routed both async and sync variants through `unlinkImpl`. Test results: 99/99 affected suites pass (atomicFileWrite 73, trustedHooks 5, trustedFolders 21). Typecheck clean on core. Refs: #4333, #4095 Phase 2
Summary
Phase 2 of #4095 — replace the remaining bare
fs.writeFile/fs.writeFileSync/fs.appendFilecalls in security-sensitive and data-integrity paths with the atomic helpers introduced by Phase 1 (PR #4096). Also closes #3681 (JSONL session writer durability).A process killed mid-write —
kill -9, OOM, power loss, AV scan stalling rename, FS unmount mid-flush — could previously leave OAuth credentials, memory metadata, session transcripts, or trust-folder config half-written or with glued}{JSONL records. After this PR all of those go throughatomicWriteFile/atomicWriteFileSync(temp + fsync + rename + EXDEV fallback + EPERM retry).Ref: #4095 Phase 2. Closes: #3681.
What changed
Ten commits, each independently green:
Core migration (6 commits):
feat(core): add atomicWriteFileSync + forceMode option— new sync helper mirroring async API (flush:true, symlink chain, EXDEV fallback, EPERM retry viaAtomics.waitblocking sleep). NewforceModeoption ignores existing-target permissions and appliesoptions.moderegardless — needed for credentials so a historically over-permissive file (e.g. 0o644 restored from backup) gets healed to 0o600 on next write.refactor(core): migrate credential writes (Tier 1)—oauth-token-storage,file-token-storage,sharedTokenManager. All write with{mode: 0o600, forceMode: true}. (cacheQwenCredentialsinqwenOAuth2.tswas initially included but de-migrated during merge with upstream PR feat(serve): auth device-flow route (#4175 Wave 4 PR 21) #4255 — see "Deliberately not in scope" below.)refactor(core): migrate memory state writes (Tier 2)—memory/manager,extract,indexer,dream,forget. Trailing-newline preserved.refactor: migrate config + logger + state writes (Tier 3a)—trustedFolders(sync),logger(4 sites), plus state-file siblings the issue didn't list but match the same risk profile:tipHistory,installationManager,projectSummary,todoWrite,trustedHooks.fix(core): flush JSONL appends to disk (Tier 3b, closes #3681)—jsonl-utils.tswriteLine/writeLineSyncgetflush: true;write()full-file replace goes throughatomicWriteFileSync;debugLogger.tsappendFile getsflush: true.refactor(core): migrate extension config + LSP edit to atomic write— catch-up commit after auditing Claude Code's equivalents:extensionManager.tsenablement config + install metadata,NativeLspService.tsLSP-driven user-file edit.Cosmetic cleanup (1 commit):
7.
chore: cosmetic cleanups from PR review—index.tsexport ordering;tipHistory.tsaddsforceMode: truefor consistency with other 0o600 sites.Codex-review-driven fixes (3 commits): all three were latent bugs flagged across three Codex review rounds. None had user-visible symptoms yet — caught before merge.
8.
fix(core): address Codex review findings on Phase 2 PR— three latent bugs from round 1:modesilently downgraded perms.if (!options?.forceMode)skipped the existing-mode stat wheneverforceModewas true, regardless of whethermodewas also supplied. CallingatomicWriteFile(p, data, { forceMode: true })(no mode) on an existing 0o600 file produced 0o644 (umask default). Tightened guard to also requireoptions.mode; mirrored in sync; added regression tests that assert mode preservation.vi.resetAllMocks()blanked theatomicWriteFileshim.vi.fn(actual.atomicWriteFile)factory implementation gets reset to no-op byvi.resetAllMocks()inbeforeEach, which would makelogger.initialize()silently skip creatinglogs.json. Tests passed by coincidence (file pre-existence). Captured real implementation at module load and re-attached viamockImplementation.NativeLspService.applyTextEditsswallowed all read errors. A read failure (EACCES on read-protected file, EISDIR, etc.) was treated as "new empty file" and the atomic rename could replace the original. Now only ENOENT is treated as new-file; other errors propagate.fix(lsp): refuse LSP edits to chmod 0444 files (round 2)— even after the previous fix, a file that's readable butchmod 0444(read-only) would still be replaced by atomic rename (rename only needs parent-directory write access). Added explicitfs.accessSync(W_OK)check before the atomic write; ENOENT still allowed so LSP can create new files.fix(core): drop withTimeout around atomic credential write (round 3)—saveCredentialsToFilewrappedatomicWriteFileinwithTimeout(5000). If the call hit the budget (slow NFS, fsync stall), withTimeout would reject while the atomicWriteFile internal write+rename kept running unobserved → the refresh lock got released → another process could write newer credentials → the original atomicWriteFile finally completed its rename and silently overwrote the newer token. Atomic write is durable by design andfs.renameis not abortable — accept the I/O latency.mkdirandstattimeouts kept (idempotent and read-only respectively).mcp/oauth-token-storage.ts,mcp/token-storage/file-token-storage.ts,qwen/sharedTokenManager.tsforceMode: trueheals historically over-permissive token files to 0o600utils/jsonl-utils.tswriteLine/writeLineSyncflush: trueper append}{glued records fromkill -9sharedTokenManager.saveCredentialsToFilewithTimeoutaround the writeNativeLspService.applyTextEditsW_OKcheck before writechmod 0444files now throw EACCES (matches pre-Phase-2 behavior); LSP edits on read-protected files no longer silently overwrite with an empty bufferDeliberately not in scope
Audited the repo for other bare-write call sites; the following are intentionally left as-is. For each I checked Claude Code's equivalent path; verdicts in parentheses.
flag: 'wx':sharedTokenManager:708,memory/manager:363,memory/store:49,chatRecordingService:633,sessionService:937— exclusive-create semantics, must not become atomic. (Claude Code agrees.)qwen/qwenOAuth2.tscacheQwenCredentials— initially migrated as part of Tier 1, then de-migrated during merge with upstream PR feat(serve): auth device-flow route (#4175 Wave 4 PR 21) #4255 (the daemon device-flow registry work). Upstream addedAbortSignalthreading +SharedTokenManager.getInstance().clearCache()invalidation into the hand-rolled atomic write.atomicWriteFiledoesn't accept a signal (andfs.renameis not abortable), so re-migrating would silently drop signal cancellation — the exact race I just fixed in round 3 forsaveCredentialsToFile. Kept upstream's hand-rolled atomic, accept slightly weaker guarantees (no EPERM retry / EXDEV fallback) in exchange for correct cancellation semantics.services/gitService.ts,services/gitWorktreeService.ts— both planned for deprecation (shadow-git checkpointing is being replaced byfileHistoryServiceper PR feat(rewind): add file restoration support to /rewind command #4064; worktree path is also slated to be retired). Not worth investing atomic-write effort in code on the way out. Defer permanently.extension/extensionSettings.tsenv writes,extension/claude-converter.ts,extension/gemini-converter.ts,extension/variables.ts— Claude Code has no equivalent (no per-extension env, no converters). Defer.agents/agent-transcript.ts:127,agents/arena/ArenaManager.ts:1616— Claude Code's sub-agent metadata sidecars use bare writeFile too (write-once, re-derivable). Defer.core/prompts.ts:406prompt dump,utils/openaiLogger.ts,tools/shell.ts,utils/truncation.ts,tools/modifiable-tool.ts,core/config/config.ts:2600— transient / per-request / scratch files. Claude Code's equivalent debug-dump uses appendFile too. Defer.cli/src/serve/httpAcpBridge.ts:1274— already a hand-rolled atomic write withwxtmp + rename. Worth folding into the shared helper in a follow-up PR with BOM/encoding regression testing.cli/src/config/settings.ts:875recovery write-back — has its own backup machinery; touching it risks settings-migration regressions, defer.Test plan
Automated
npx vitest run packages/core/src/utils/atomicFileWrite.test.ts— 36 tests (19 async, 13 sync, 5 forceMode including 2 round-1 regression cases for the "no mode" edge)npx vitest run packages/core/src/utils/jsonl-utils.test.ts packages/core/src/utils/debugLogger.test.ts— 44 passlogger.test.ts52 pass —beforeEachre-binds the realatomicWriteFileaftervi.resetAllMocks()(fix from Codex round 1)trustedFolders.test.ts21,todoWrite.test.ts28,extensionManager.test.ts42, LSP suite 62 passnpx tsc --noEmit -p packages/core && npx tsc --noEmit -p packages/clicleanManual smoke verification — completed
Verified on macOS arm64, Node v24.12.0, against this branch (
worktree-ethereal-bubbling-cook). Combined library-level integration script (covers crash-safety claims that don't need a model) with a realnpm run devtmux run (proves the actual code paths users hit fire correctly under kill -9).Part 1 — library-level integration script (17/17 pass)
Direct exercise of
atomicWriteFile/atomicWriteFileSync/writeLine/writeLineSyncagainst real disk, including a kill -9 subprocess for the JSONL claim. Confirms every Codex-round fix is regression-proof at the helper level.verify-atomic-helpers.mjs — full script (click to expand)
Actual output (run on this branch):
Part 2 — real CLI end-to-end (tmux +
npm run dev)Validates the helper changes work inside an actual qwen-code session — specifically that
logger.ts(logs.json full-file rewrite) andjsonl-utils.writeLine(per-turn session transcript append) survive a hard process kill and/resumecan still read the JSONL. This is the closest reproduction of the real #3681 failure mode.Environment: API key auth (
selectedType: "openai"→ DASHSCOPE → glm-5). OAuth path is dormant for API-key users, so the high-value paths under load arelogger.tsandjsonl-utils.ts— exactly what's verified here.Commands (run from the worktree root):
Observed results:
glm-5shownkill -9 <leaf PID>mid-responseJSON.parse(logs.json)}{/resumepicker请用大约200字详细描述...hellovisibleCoverage matrix — PR claim → verification
atomicWriteFileis crash-atomic (write tmp + fsync + rename)atomicWriteFileSyncmirrors async semanticsforceMode: trueheals legacy 0o644 → 0o600forceModewithoutmodepreserves existing perms (Codex round-1 fix)flush: truesurvives kill -9 — no glued}{(closes #3681)chmod 0444(Codex round-2 fix)withTimeoutremoval eliminates token-overwrite race (Codex round-3 fix)sharedTokenManager.test.ts(31 pass)/rewind-style session resume still works after a kill/resumepicker + select)Not covered (out of scope for local verification)
~/.qwen/oauth_creds.jsonin the test environment held test fixtures, and the user's CLI uses API-key auth instead. The code path is identical to the verifiedatomicWriteFile(creds, {mode:0o600, forceMode:true})shape, just driven by the OAuth refresh callback rather than the logger.renameWithRetry/renameWithRetrySyncEPERM retry path is exercised by the existing unit tests but not under live AV pressure.Out-of-scope follow-ups
cli/src/serve/httpAcpBridge.ts:1274hand-rolled atomic write into the shared helper~/.qwen/settings.json— defense in depth, low priority🤖 Generated with Qwen Code